Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning


Abstract

Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms, in principle, can provide for much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits to accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure model-based approach trained on just random action data can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbmf
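The model-based loop summarized in the abstract (collect random-action data, fit a dynamics model, plan with MPC) can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the 1-D environment `true_step`, the step size `DT`, the one-parameter least-squares model that stands in for a neural network, and the choice of random-shooting as the MPC optimizer are all hypothetical, as are the function names.

```python
import random

DT = 0.1  # hypothetical simulation step size


def true_step(state, action):
    """Hypothetical ground-truth environment: a 1-D point moved by action * DT."""
    return state + DT * action


def collect_random_data(n, rng):
    """Roll out uniformly random actions and record (state, action, next_state)."""
    data, state = [], 0.0
    for _ in range(n):
        action = rng.uniform(-1.0, 1.0)
        next_state = true_step(state, action)
        data.append((state, action, next_state))
        state = next_state
    return data


def fit_dynamics(data):
    """Least-squares fit of k in (next_state - state) ~= k * action.

    A one-parameter stand-in (assumption) for a neural network dynamics model.
    """
    num = sum(a * (s2 - s) for s, a, s2 in data)
    den = sum(a * a for _, a, _ in data)
    return num / den


def mpc_action(state, target, k, rng, horizon=5, n_candidates=200):
    """Random-shooting MPC: sample action sequences, simulate them with the
    fitted model, and return the first action of the lowest-cost sequence."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = state, 0.0
        for a in actions:
            s = s + k * a              # predicted next state under the model
            cost += (s - target) ** 2  # squared distance to the target
        if cost < best_cost:
            best_cost, best_first = cost, actions[0]
    return best_first


def run_controller(target=1.0, steps=60, seed=0):
    """Fit the model on random data, then replan with MPC at every step."""
    rng = random.Random(seed)
    k = fit_dynamics(collect_random_data(100, rng))
    state = 0.0
    for _ in range(steps):
        state = true_step(state, mpc_action(state, target, k, rng))
    return k, state
```

Calling `run_controller()` returns the fitted coefficient and the final state. Replanning at every step and executing only the first action is what makes this a receding-horizon (MPC) controller rather than a single open-loop plan.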


Bibliographic details

Authors: Nagabandi, Anusha; Kahn, Gregory; Fearing, Ronald S.; Levine, Sergey
Year: 2017
Original format: PDF

